Abstract

Belief Propagation (BP) decoding of LDPC codes is extended to the case of joint source-channel coding. The uncompressed source is treated as a Markov process, characterized by a transition matrix, T, which is utilized as side information for the joint scheme. The method is based on the ability to calculate a Dynamical Block Prior (DBP) for each decoded symbol separately, and to re-estimate this prior after every iteration of the BP decoder. We demonstrate the implementation of this method using MacKay and Neal's LDPC algorithm over GF(q), and present simulation results indicating that the proposed scheme is comparable with the separation scheme, even when advanced compression algorithms (such as AC and PPM) are used. The extension to 2D (and higher-dimensional) arrays of symbols is straightforward. The possibility of using the proposed scheme without side information is briefly sketched.



1. INTRODUCTION 

The Shannon separation theorem [1, 2] states that source coding and channel coding can be performed separately and sequentially while maintaining optimality. However, this is true only in the limit of asymptotically long blocks of data. Thus, considerable interest has developed in various schemes of joint source-channel coding, where the inherent redundancy of the source is utilized for error correction, possibly with the aid of some side information (see, for instance, [3]). Combining the two processes may be motivated by a reduction in the total complexity of the procedure and by some gain in overall performance. Moreover, some uncompressed files (e.g. bitmap, text) are expected to be resilient to single-bit errors, whereas in the separation scheme a single-bit error may corrupt an entire compressed block.

Shannon's bound on the capacity of a binary symmetric channel (BSC) with flip probability f, for a bit error rate p_b and a source entropy H(src) per bit, is given by [1]:

$$ C = \frac{1 - H_2(f)}{H(\mathrm{src}) - H_2(p_b)} \qquad (1) $$

where $H_2(x) = -x\log_2(x) - (1 - x)\log_2(1 - x)$ is the binary entropy function, and the capacity, C, is the maximal ratio between the source length k and the transmitted length m.
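As a numerical check of Eq. (1), the sketch below finds the largest flip probability f_sh at which a given rate is still achievable. It is a minimal sketch, assuming p_b = 0 by default and using bisection (any 1D root finder would do); the function names are illustrative, not taken from the paper.

```python
import math


def h2(x: float) -> float:
    """Binary entropy H2(x) in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def capacity_bsc(f: float, h_src: float, p_b: float = 0.0) -> float:
    """Eq. (1): C = (1 - H2(f)) / (H(src) - H2(p_b)); assumes H2(p_b) < H(src)."""
    return (1.0 - h2(f)) / (h_src - h2(p_b))


def shannon_noise(rate: float, h_src: float, p_b: float = 0.0) -> float:
    """Largest f in [0, 1/2] with capacity_bsc(f) >= rate.

    C decreases monotonically in f on [0, 1/2], so bisection is enough.
    """
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if capacity_bsc(mid, h_src, p_b) >= rate:
            lo = mid
        else:
            hi = mid
    return lo


# Rate 1/3 with H = 0.49 bits per bit (the q = 4 row of Table 1).
f_sh = shannon_noise(1 / 3, 0.49)
print(f"f_sh ~ {f_sh:.3f}")
```

For the q = 4 row this gives f_sh ≈ 0.267, close to the 0.266 listed in Table 1.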

In this paper we propose an extension of the decoding algorithm of Low-Density Parity-Check (LDPC) codes [?], primarily designed for i.i.d. sequences, to the case of uncompressed data. Our approach is to regard the source sequence, {s_n}, as generated by a stationary first-order Markov process with a finite alphabet s_n ∈ {0, 1, 2, ..., q - 1} and a q × q transition matrix T, which describes the probability of a transition from symbol i to symbol j: t_ij = P(s_{n+1} = j | s_n = i). The Markov entropy (per symbol) of such a process is given by:

$$ H = -\sum_{i} P(i) \sum_{j} P(j \mid i) \log_2 P(j \mid i) \qquad (2) $$

where P(i) is the stationary solution of the Markov process. The entropy per bit, H / log_2 q (log_2 q being the number of bits in the binary representation of a symbol), can be utilized as H(src) in Eq. (1).
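Eq. (2) can be evaluated directly from T once the stationary distribution P(i) is known. The sketch below is a minimal illustration, assuming an irreducible, aperiodic chain so that plain power iteration converges; the helper names are hypothetical.

```python
import math


def stationary(t, iters=10_000, tol=1e-13):
    """Stationary distribution P(i) of a row-stochastic matrix t by power iteration.

    Assumes an irreducible, aperiodic chain so that the iteration converges.
    """
    q = len(t)
    p = [1.0 / q] * q
    for _ in range(iters):
        new = [sum(p[i] * t[i][j] for i in range(q)) for j in range(q)]
        if max(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


def markov_entropy(t):
    """Eq. (2): H = -sum_i P(i) sum_j P(j|i) log2 P(j|i), in bits per symbol."""
    p = stationary(t)
    return -sum(p[i] * tij * math.log2(tij)
                for i, row in enumerate(t) for tij in row if tij > 0.0)


def entropy_per_bit(t):
    """H / log2(q), the quantity used as H(src) in Eq. (1)."""
    return markov_entropy(t) / math.log2(len(t))


T = [[0.89, 0.11], [0.11, 0.89]]
print(f"H = {markov_entropy(T):.3f} bits/symbol, {entropy_per_bit(T):.3f} bits/bit")
```

For this binary matrix H ≈ 0.5, the value quoted for it in the AWGN simulations of Section 4.3.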

Neighboring symbols in a Markov sequence are correlated. Hence, information about the symbols s_{n-1} and s_{n+1} immediately implies some knowledge about s_n as well. The main contribution of this work is a method of incorporating this additional knowledge into the Belief Propagation decoding scheme.

2. MN ALGORITHM 

Our joint source-channel scheme is based on MacKay and Neal's algorithm (a thorough introduction may be found in [5]), a variant of the earlier Gallager code [?]. Although originally proposed for the binary field, the MN algorithm extends straightforwardly to higher finite fields, as demonstrated in [?]. The original motivation for moving to higher fields was to reduce the number of edges (and short loops) in the code's graph. For our purpose, it enables us to treat Markov sequences with a richer alphabet, consisting of q = 2^l symbols (l being an integer). The algorithm uses two sparse matrices known to both the sender and the receiver: A (m × k) and B (m × m), where k is the source block length, m is the transmitted block length, and the code rate is R = k/m. All nonzero elements of A and B are taken from {1, 2, ..., q - 1}, and B must be invertible. A source vector s is encoded into a codeword t (all operations are performed over GF(q)) by:

$$ t = B^{-1} A\, s \qquad (3) $$

t is converted to binary representation and transmitted over the channel. During transmission, noise n is added to t, so the received vector is r = t + n. Upon receipt, the decoder converts r back to the original field and computes the syndrome vector z = B r. The receiver then faces the following decoding problem:

$$ z = B\,(t + n) = B\,(B^{-1} A\, s + n) = [A\,B]\, x \qquad (4) $$

where square brackets denote the concatenation of matrices, and x is the concatenation of s and n. The decoding problem can be visualized as a bipartite graph (Fig. 1): the elements of x (circles) and z (squares) are termed "variable" and "check" nodes, respectively. The edges of the graph correspond to the nonzero elements of [A B]. For the MN algorithm, one should further distinguish between "source variables", the s elements of x (filled circles), and "noise variables", the n elements of x (empty circles). The decoding problem may be solved using the Belief Propagation (BP), or sum-product, algorithm [?]. BP is an iterative algorithm with two alternating steps: the horizontal pass (check → variable messages) and the vertical pass (variable → check messages). During the vertical pass, some prior knowledge is assigned to each decoded symbol according to the assumed statistics (for the i.i.d. case this is simply Pr(s = j) = 1/q for all source symbols). The key point here is that one can re-estimate and re-assign these priors after every iteration, individually for each decoded symbol [7]. The outcome of each iteration is an a-posteriori probability Q_i^a = Pr(x_i = a) for each symbol (both source and noise). The MN decoder is linear in the size of the source block, k, with complexity O(kqu) per iteration, where u is the average number of checks per symbol [?].

Figure 1: The Dynamical Block Prior visualized as a third layer attached to the bipartite graph of the LDPC code.

A proper construction of the matrices A and B is crucial for nearly capacity-achieving performance. In this work we follow the Kanter and Saad (KS) constructions [?], which are very sparse, simple to construct, and perform very close to the bound. The B matrix has a systematic construction (diagonal and sub-diagonal), which simplifies computational tasks [?]. The KS construction for R = 1/3 over GF(2) is displayed schematically in Fig. 2, where black regions denote nonzero elements. The construction is extended to higher GF(q) by randomly replacing the nonzero elements with elements of the corresponding field. Although the KS matrices were originally constructed for i.i.d. sources, we used them successfully for uncompressed sources; we note, however, that the performance might be improved by devising better codes.
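To make Eqs. (3) and (4) concrete, the sketch below encodes over GF(2) and checks that the syndrome of the received word equals A s + B n. It rests on simplifying assumptions: only GF(2) is handled, a random sparse A (hypothetical) stands in for the KS A, and B is taken as a plain lower-bidiagonal matrix, a simplified reading of the "diagonal and sub-diagonal" structure. That structure lets B^{-1} be applied by forward substitution instead of an explicit inverse.

```python
import random


def encode_gf2(a_rows, s):
    """t = B^{-1} A s over GF(2), for a lower-bidiagonal B (ones on diagonal and sub-diagonal).

    a_rows[i] lists the column indices of the nonzeros in row i of A, so len(a_rows) = m.
    B t = y is solved by forward substitution: t_0 = y_0, t_i = y_i + t_{i-1} (mod 2).
    """
    y = [sum(s[j] for j in row) % 2 for row in a_rows]   # y = A s
    t = []
    prev = 0
    for yi in y:
        prev = (yi + prev) % 2
        t.append(prev)
    return t


def apply_b_gf2(v):
    """B v for the same bidiagonal B: (B v)_i = v_i + v_{i-1} (mod 2)."""
    return [(v[i] + (v[i - 1] if i > 0 else 0)) % 2 for i in range(len(v))]


rng = random.Random(1)
k, m = 6, 18                                          # rate R = k/m = 1/3
a_rows = [rng.sample(range(k), 2) for _ in range(m)]  # hypothetical random sparse A
s = [rng.randint(0, 1) for _ in range(k)]
t = encode_gf2(a_rows, s)                             # Eq. (3)
n = [int(rng.random() < 0.1) for _ in range(m)]       # BSC noise with f = 0.1
r = [(ti + ni) % 2 for ti, ni in zip(t, n)]
z = apply_b_gf2(r)                                    # syndrome z = B r
a_s = [sum(s[j] for j in row) % 2 for row in a_rows]
print(z == [(x + y) % 2 for x, y in zip(a_s, apply_b_gf2(n))])   # Eq. (4): z = A s + B n
```

The decoder never sees t itself: the syndrome depends only on s and n, which is what turns decoding into the inference problem on x described above.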

The MN algorithm is also applicable to an Additive White Gaussian Noise (AWGN) channel. The binary transmitted vector (assumed for simplicity to take values ±1) is corrupted by noise with zero mean and variance σ², hence the received vector, r^R, is real-valued. The binary received vector, r, is determined by hard decision, namely r_i = +1 if r_i^R > 0 (and r_i = -1 otherwise). The probability of the transmitted bit t_i = ±1 is given by:

$$ P(t_i = \pm 1 \mid r_i^R) = \frac{e^{-(t_i - r_i^R)^2/2\sigma^2}}{e^{-(t_i - r_i^R)^2/2\sigma^2} + e^{-(t_i + r_i^R)^2/2\sigma^2}} = \frac{1}{1 + e^{-2 t_i r_i^R/\sigma^2}} \qquad (5) $$

and the probability of an error in the i-th hard-decision bit is given by:

$$ p_i = \frac{1}{1 + e^{2|r_i^R|/\sigma^2}} \qquad (6) $$

Eq. (6) is used for calculating a prior for each noise variable. Apart from these modifications, the MN algorithm for an AWGN channel is identical to the BSC case.
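The following sketch evaluates Eqs. (5) and (6) for single received samples. It is a minimal illustration with hypothetical function names; the logistic form is computed so that large |r_i^R|/σ² cannot overflow `math.exp`.

```python
import math


def prob_plus_one(r_real: float, sigma: float) -> float:
    """Eq. (5) for t_i = +1: 1 / (1 + exp(-2 r / sigma^2)), evaluated without overflow."""
    a = 2.0 * r_real / sigma**2
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def hard_decision(r_real: float) -> int:
    """r_i = +1 if r_i^R > 0, else -1."""
    return 1 if r_real > 0 else -1


def flip_prob(r_real: float, sigma: float) -> float:
    """Eq. (6): probability that the hard decision on r_i^R is wrong (noise-variable prior)."""
    e = math.exp(-2.0 * abs(r_real) / sigma**2)
    return e / (1.0 + e)


sigma = 0.8
for r in (-1.2, 0.05, 0.9):
    print(r, hard_decision(r), round(flip_prob(r, sigma), 4))
```

Samples close to zero get a flip probability near 1/2, so the corresponding noise variables enter BP with an almost uninformative prior, while large |r_i^R| yields a confident one.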

The channel capacity for AWGN is given by [2]:

$$ C = \frac{\log_2(1 + 1/\sigma^2)}{2\, H(\mathrm{src})} \qquad (7) $$
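Inverting Eq. (7) gives the Shannon noise level in closed form, 1/σ_sh² = 2^{2 R H(src)} - 1. The sketch below computes it, and also converts it to Eb/N0 using the convention Eb/N0 = 1/(2Rσ²) for ±1 signalling; that convention is an assumption on our part, chosen because it reproduces the -4.2 dB quoted in Section 4.3.

```python
import math


def sigma_shannon(rate: float, h_src: float) -> float:
    """Noise level at which Eq. (7) gives C = rate: 1/sigma^2 = 2^(2 R H(src)) - 1."""
    return 1.0 / math.sqrt(2.0 ** (2.0 * rate * h_src) - 1.0)


def eb_n0_db(rate: float, sigma: float) -> float:
    """Eb/N0 = 1 / (2 R sigma^2) in dB, for +/-1 signalling (an assumed convention)."""
    return 10.0 * math.log10(1.0 / (2.0 * rate * sigma**2))


s_sh = sigma_shannon(0.25, 0.5)   # R = 1/4, H(src) = 0.5 as in Section 4.3
print(f"sigma_sh ~ {s_sh:.3f}, Eb/N0 ~ {eb_n0_db(0.25, s_sh):.1f} dB")
```

This gives σ_sh ≈ 2.299 and about -4.2 dB, matching the values quoted in Section 4.3 up to rounding.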

For binary inputs (rather than real-valued inputs), however, there exists a tighter bound [11]:

$$ C = \frac{-\int dy\, P(y)\log_2 P(y) + \int dy\, P(y \mid x = x_0)\log_2 P(y \mid x = x_0)}{H(\mathrm{src})} \qquad (8) $$

where x is the transmitted bit, x_0 = ±1, y is the received (corrupted) value, P(y | x = x_0) is the Gaussian density with mean x_0 and variance σ², and

$$ P(y) = \frac{1}{2\sqrt{2\pi}\,\sigma}\left[e^{-(y - x_0)^2/2\sigma^2} + e^{-(y + x_0)^2/2\sigma^2}\right] \qquad (9) $$
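The numerator of Eq. (8) is the mutual information between an equiprobable ±1 input and the channel output, which has no closed form but is easy to integrate numerically. The sketch below uses a plain midpoint rule on a truncated range (integration range and grid size are illustrative choices, not values from the paper). Since a Gaussian input maximizes the mutual information, the result is always below the numerator of Eq. (7), which is why Eq. (8) is the tighter bound.

```python
import math


def gauss(y: float, mean: float, sigma: float) -> float:
    """Normal density N(y; mean, sigma^2)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)


def biawgn_info(sigma: float, n: int = 20_000, span: float = 10.0) -> float:
    """Numerator of Eq. (8) in bits: -int P(y) log2 P(y) dy + int P(y|x0) log2 P(y|x0) dy.

    P(y) is Eq. (9) with x0 = 1; both integrals use a midpoint rule on
    [-1 - span*sigma, 1 + span*sigma], outside of which the densities are negligible.
    """
    lo, hi = -1.0 - span * sigma, 1.0 + span * sigma
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        p_cond = gauss(y, 1.0, sigma)                  # P(y | x = x0)
        p_y = 0.5 * (p_cond + gauss(y, -1.0, sigma))   # Eq. (9)
        if p_y > 0.0:
            total -= p_y * math.log2(p_y) * dy
        if p_cond > 0.0:
            total += p_cond * math.log2(p_cond) * dy
    return total


def rate_bound_biawgn(sigma: float, h_src: float) -> float:
    """Eq. (8): C = I(X;Y) / H(src)."""
    return biawgn_info(sigma) / h_src


sigma = 2.298
print(biawgn_info(sigma), 0.5 * math.log2(1 + 1 / sigma**2))
```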



3. DERIVING THE DYNAMICAL BLOCK PRIORS

In every iteration of the MN algorithm, a better estimate of each variable node is attained (on average). In this section we describe our method of incorporating both the statistical knowledge about the source and these local estimates. Consider three successive symbols s_{n-1}, s_n, s_{n+1} in a sequence generated by a Markov process with transition matrix T and alphabet GF(q). The probability of a triplet a, b, c is given by [13]:

$$ P(a,b,c) = P(a,b)\, P(c \mid a,b) = P(a,b)\, P(c \mid b) = \frac{P(a,b)\, P(b,c)}{P(b)} \qquad (10) $$

where use has been made of Bayes' rule, P(x, y) = P(x) P(y|x), and of the Markov property of the process (s_{n+1} depends only on s_n). Now, given the a-posteriori probabilities for the first and last symbols in the triplet, Q_{n-1}^a = Pr(s_{n-1} = a) and Q_{n+1}^c = Pr(s_{n+1} = c), one can calculate a prior for the probability that s_n = b:

$$ \Pr(s_n = b) = \frac{1}{Z} \sum_{a,c} P(a,b,c)\, Q_{n-1}^a\, Q_{n+1}^c = \frac{1}{Z}\, P(b)^{-1} \Big( \sum_a P(a,b)\, Q_{n-1}^a \Big) \Big( \sum_c P(b,c)\, Q_{n+1}^c \Big) \qquad (11) $$

where Z is a normalization constant such that $\sum_b \Pr(s_n = b) = 1$. We term Eq. (11) the Dynamical Block Prior (DBP).
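Eq. (11) translates almost line by line into code. The sketch below computes the DBP of one symbol from the posteriors of its two neighbours, using the factorized form that costs O(q²) rather than the O(q³) triple sum of Eq. (10). It assumes P(a, b) is built from a stationary marginal P(a) and T; the names are illustrative.

```python
def joint_from_t(t, p):
    """P(a, b) = P(a) T[a][b] for a stationary chain with marginal p."""
    q = len(p)
    return [[p[a] * t[a][b] for b in range(q)] for a in range(q)]


def dbp(p_joint, p, q_prev, q_next):
    """Eq. (11): prior for s_n from the posteriors q_prev (of s_{n-1}) and q_next (of s_{n+1}).

    Each of the q outputs costs two length-q sums, i.e. O(q^2) per symbol,
    instead of the O(q^3) triple sum over (a, b, c) from Eq. (10).
    """
    q = len(p)
    prior = []
    for b in range(q):
        if p[b] == 0.0:
            prior.append(0.0)          # a symbol that never occurs gets zero prior
            continue
        left = sum(p_joint[a][b] * q_prev[a] for a in range(q))    # sum_a P(a,b) Q_{n-1}^a
        right = sum(p_joint[b][c] * q_next[c] for c in range(q))   # sum_c P(b,c) Q_{n+1}^c
        prior.append(left * right / p[b])
    z = sum(prior)                     # normalization constant Z
    return [x / z for x in prior]


T = [[0.9, 0.1], [0.1, 0.9]]
P = [0.5, 0.5]                         # stationary distribution of this symmetric T
prior = dbp(joint_from_t(T, P), P, q_prev=[0.99, 0.01], q_next=[0.99, 0.01])
print([round(x, 4) for x in prior])
```

Two confident neighbours pull the prior of s_n strongly towards 0 (about 0.986 here), whereas uniform neighbour posteriors simply return the stationary distribution P(b).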

The extension of the MN algorithm to the joint source-channel case consists of the following steps:

1. A binary sequence of k · log_2(q) bits is converted to k GF(q) symbols.

2. The encoder measures T and P(a) for all q symbols over the source, and reliably transmits this side information to the decoder.

3. The source is encoded according to Eq. (3), then reconverted to binary representation and transmitted over the BSC.

4. The decoder maps the received signal back to GF(q) and performs the regular decoding (Eq. (4)), but after every iteration of BP the prior of each source symbol is recalculated according to Eq. (11).
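Steps 1 and 2 above are plain bookkeeping. The sketch below packs bits into symbols and measures the empirical T and P(a) of a block. Two details are assumptions not stated in the text: bits are packed most-significant first, and a row of T for a symbol that never precedes another symbol is set uniform.

```python
from collections import Counter


def bits_to_symbols(bits, bits_per_symbol):
    """Step 1: pack bits into GF(q) symbols, q = 2**bits_per_symbol.

    Most-significant bit first (an assumed convention); leftover bits are dropped.
    """
    symbols = []
    for i in range(0, len(bits) - bits_per_symbol + 1, bits_per_symbol):
        v = 0
        for bit in bits[i:i + bits_per_symbol]:
            v = (v << 1) | bit
        symbols.append(v)
    return symbols


def estimate_side_info(symbols, q):
    """Step 2: empirical transition matrix T and symbol frequencies P(a) of one block.

    Rows for symbols that never precede another symbol are set uniform (an assumption).
    """
    pairs = Counter(zip(symbols, symbols[1:]))
    outgoing = Counter(symbols[:-1])
    t = [[pairs[(i, j)] / outgoing[i] if outgoing[i] else 1.0 / q for j in range(q)]
         for i in range(q)]
    freq = Counter(symbols)
    p = [freq[a] / len(symbols) for a in range(q)]
    return t, p


syms = bits_to_symbols([1, 0, 1, 1, 0, 1, 1, 1], 2)   # q = 4
print(syms)
T_hat, P_hat = estimate_side_info(syms, 4)
```

On a real block of 10^4 bits these estimates are what the encoder would send as the header discussed in Section 5.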

The complexity of calculating the q priors for a single symbol from the posteriors of its neighbors is reduced from q^3 in the naive calculation to q^2 by Eq. (11). The decoder's complexity therefore remains linear, with a total complexity of O(kqu + kq^2) per iteration. The above-mentioned procedure may be thought of as adding a layer to the bipartite random graph represented by the matrix [A B]. The DBPs, Eq. (11), are messages passed only among source variable nodes, which are spatially related. In Fig. 1 the diamonds represent this new (directional) layer, which connects neighboring source nodes. We note that whether this scheme can be extended to Gallager codes is an open question, since the source is not explicitly represented in their graph.

4. SIMULATION RESULTS 

We report here results for a BSC with rate R = 1/3 and for an AWGN channel with rate 1/4, using the corresponding KS constructions for A and B devised in [10, 11]. Other rates, constructions and block lengths were also checked. Random vectors of length L = 10^4 bits (9,999 for q = 8) were generated by the Markov process, then mapped to a vector in GF(q) of length k = L/log_2(q), and were encoded and decoded as described in the previous section. For each reported result, at least 1000 sample vectors were generated and transmitted.

Table 1: Simulation results for BSC with rate 1/3

q  | k    | H     | f_sh  | f_c   | f_∞
4  | 5000 | 0.49  | 0.266 | 0.215 | 0.244
8  | 3333 | 0.471 | 0.271 | 0.223 | 0.243
16 | 2500 | 0.49  | 0.266 | 0.21  | 0.236
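The sample vectors can be produced by simulating the chain directly. The sketch below is a minimal generator, shown with the binary matrix of Section 4.3 because the larger-alphabet matrices behind Table 1 are not reproduced in the text; starting the chain from a uniform random symbol is our assumption, since the initialization is not specified.

```python
import random


def sample_markov(t, length, rng, start=None):
    """Draw `length` symbols from the chain with row-stochastic transition matrix t.

    The first symbol is uniform unless `start` is given (an assumption; the
    initialization is not specified in the text).
    """
    q = len(t)
    s = rng.randrange(q) if start is None else start
    out = [s]
    for _ in range(length - 1):
        s = rng.choices(range(q), weights=t[s])[0]
        out.append(s)
    return out


rng = random.Random(2024)
T = [[0.89, 0.11], [0.11, 0.89]]          # the binary chain of Section 4.3
seq = sample_markov(T, 10_000, rng)       # L = 10^4 bits
flips = sum(a != b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
print(f"empirical flip rate ~ {flips:.3f}")
```

For q > 2 the resulting bits would then be grouped into GF(q) symbols of length k = L/log_2(q).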

4.1. Estimating The Code's Threshold 

The threshold for infinite source length, k → ∞, is estimated from a scaling argument for the convergence time, which was previously observed for q = 2 [10, 11]. The convergence time, measured in iterations of the MN algorithm, is assumed to diverge as the noise level approaches the threshold from below. More precisely, we found that the scaling of the divergence of the median convergence time t_med is independent of q and is consistent with:

$$ t_{\mathrm{med}}(f) \propto \frac{1}{f_\infty - f}, \qquad t_{\mathrm{med}}(\sigma) \propto \frac{1}{\sigma_\infty - \sigma} \qquad (12) $$

for a BSC and an AWGN channel, respectively.

This extrapolation is independent of k [?] (for k ≫ 1), so by monitoring t_med for moderate k, the threshold can be found by a linear fit (see the inset of Fig. 3). Note that estimating t_med is a simple computational task compared with estimating low bit error probabilities for large k, especially close to the threshold. We also note that the analysis is based on t_med rather than on the average number of iterations, in order to prevent the dramatic effect of a small fraction of samples with slow convergence or no convergence.
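By Eq. (12), the noise level is linear in 1/t_med, f = f_∞ - c/t_med, so the threshold is the intercept of a straight-line fit. The sketch below is an ordinary least-squares version of that fit; t_med at each noise level would be the median over samples (e.g. via `statistics.median`). The input numbers are synthetic and noise-free, not the paper's measurements.

```python
def extrapolate_threshold(noise, t_med):
    """Fit noise = f_inf - c / t_med by least squares (Eq. (12)); returns (f_inf, c).

    f_inf is the intercept at 1/t_med -> 0; the same fit gives sigma_inf for AWGN data.
    """
    xs = [1.0 / t for t in t_med]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(noise) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, noise))
    slope = sxy / sxx
    return my - slope * mx, -slope


# Synthetic, noise-free data (not the paper's measurements): f = 0.242 - 0.5 / t_med.
t_med = [20.0, 40.0, 80.0, 160.0]
f = [0.242 - 0.5 / t for t in t_med]
f_inf, c = extrapolate_threshold(f, t_med)
print(f"f_inf ~ {f_inf:.3f}")
```

With real data, only working points reasonably close to the threshold should enter the fit, since Eq. (12) describes the divergence near f_∞.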
Figure 3: Bit error rate p_b vs. noise level f; triangles for q = 8, squares for q = 4; filled symbols: all elements of T used as side information; empty symbols: only the q dominant elements of T used. Inset: scaling behavior of the median convergence time t_med; f_∞ = 0.242 is found by a linear fit.



4.2. BSC Simulations

Some selected results are presented in Table 1. The columns correspond to: the field size q; the source length in symbols, k; the entropy (per bit) of the source, H [?]; the corresponding maximal noise f_sh (Eq. (1)); the critical noise f_c, up to which the bit error rate p_b < 10^{-5}; and the threshold f_∞, Eq. (12).

In order to compare the joint and the separation schemes, the generated samples were concatenated into strings of L = 10^5 to 10^6 bits and compressed using two advanced compression algorithms: Prediction by Partial Match (PPM) [?] and an Arithmetic Coder (AC) [17]. In Table 2 we present the compression ratio of the source vectors for each method (%AC, %PPM); the ratio between the compressed sequences and the transmitted blocks, R_AC and R_PPM (for the comparison we use the same transmitted size as for the joint scheme, m = 3k); and the maximal noise level for this new rate for an i.i.d. source, f_AC and f_PPM. Since we assume an optimal decoder, these noise levels should be compared with f_∞ for the joint scheme. In all cases, the threshold of the proposed joint scheme for k → ∞ is comparable with that of the separation scheme. One should recall that these results may be improved by advanced codes. In Fig. 3, p_b is plotted against the noise level of the channel, f, for the examples of Table 1. Filled triangles represent q = 8; filled squares represent q = 16; the empty symbols in this figure refer to an approximation that will be described in Section 5. The dashed (full) arrow marks f_sh for q = 8 (16). The inset of Fig. 3 demonstrates the extrapolation of f_∞ from the convergence time, Eq. (12): for q = 8, f(t_med) is plotted against 1/t_med, and f_∞ is then recovered by a linear fit.
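The R and f columns of Table 2 follow from the compression ratios: the compressed stream is sent in m = 3k symbols, so R = (compressed fraction)/3, and f is the largest BSC flip rate at which an i.i.d. source (H(src) = 1, p_b = 0 in Eq. (1)) can be sent at that rate. The sketch below reproduces this arithmetic under those assumptions; its outputs land within roughly 0.001 of the tabulated entries.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def max_noise_iid(rate: float) -> float:
    """Largest BSC flip rate f <= 1/2 with 1 - H2(f) >= rate (Eq. (1), H(src) = 1, p_b = 0)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - h2(mid) >= rate:
            lo = mid
        else:
            hi = mid
    return lo


def separation_point(compressed_fraction: float, m_over_k: float = 3.0):
    """Rate of the compressed stream sent in m = 3k symbols, and its i.i.d. noise limit."""
    rate = compressed_fraction / m_over_k
    return rate, max_noise_iid(rate)


for label, frac in [("AC, q=4", 0.584), ("PPM, q=4", 0.566)]:
    r, f = separation_point(frac)
    print(f"{label}: R ~ {r:.3f}, f ~ {f:.3f}")
```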



Table 2: Critical noise level for the separation scheme using the Arithmetic Coder and Prediction by Partial Match compression algorithms. f_AC and f_PPM should be compared with f_∞ in Table 1.

q  | %AC   | R_AC  | f_AC  | %PPM  | R_PPM | f_PPM
4  | 58.4% | 0.195 | 0.247 | 56.6% | 0.189 | 0.25
8  | 58.1% | 0.194 | 0.248 | 55.5% | 0.185 | 0.253
16 | 60.5% | 0.201 | 0.243 | 59.4% | 0.198 | 0.245

4.3. AWGN Simulations

Binary sequences of length k = 10^4 were generated using the following transition matrix:

$$ T = \begin{pmatrix} 0.89 & 0.11 \\ 0.11 & 0.89 \end{pmatrix} $$

having Markov entropy H(src) = 0.5. For rate R = 1/4, this entropy corresponds to a maximal noise, Eq. (7), of σ_sh = 2.298 (-4.2 dB). The sequences were transmitted over an AWGN channel using three different fields: GF(2), GF(4) and GF(8). Fig. 4 presents the scaling behavior, Eq. (12), for these fields (triangles, squares and circles, respectively). The symbols mark working points with p_b < 10^{-5}, and were used for estimating the corresponding thresholds: σ_∞(q = 2) = 2.08, σ_∞(q = 4) = 2.14, σ_∞(q = 8) = 2.17. It is evident that as q increases, both σ_c(q) and σ_∞(q) improve.

Figure 4: Scaling behavior for the AWGN channel: σ is plotted vs. 1/t_med for various field sizes, q.

5. REDUCING THE AMOUNT OF SIDE INFORMATION

The results of the previous sections indicate that the performance of the presented joint coding is not too far from Shannon's bound and that, most probably, the channel capacity can be nearly saturated using an optimized code. However, for a finite block length, the main drawback of our algorithm is the overhead of the header (i.e. the transmitted side information), which must be encoded and transmitted reliably. One has to remember that the size of the header (T) scales with q^2, where the precision of each element is of order O(log k). This overhead is especially intolerable in the limit where q^2 log(k)/k ~ O(1). Note that this is indeed the situation even for very large messages, k = 10^6, with a symbol size of 8 bits (a "char", q = 256).

This point may be tackled by observing that for a process with low entropy, characterized by enhanced repetitions and correlations, T is dominated by a small number of elements, while the rest of the elements are negligible. We therefore repeated our simulations using only the q largest elements of T as side information. The decoder then sets all other elements in each row of T equally, so as to obey the normalization condition $\sum_j T_{ij} = 1$. In Fig. 3 the empty squares/triangles represent working points for the algorithm with q = 8/16. In both cases, the critical noise level f_c is only slightly decreased, but the size of the side information becomes considerably smaller.

6. JOINT SOURCE-CHANNEL CODING IN THE ABSENCE OF SIDE INFORMATION

In this section we describe how the Markovian decoder can be implemented without any transmission of side information. The key points are the special properties of the KS construction (Fig. 2): the first k rows of A have a single nonzero element per row and column, whereas each of the first k rows of B has two nonzero elements. Furthermore, due to the systematic form of B, no row can be written as a linear combination of the other rows. Hence, the first k symbols of the syndrome vector z are equal (up to a simple permutation) to the source, corrupted with an effective flip rate f_eff. For GF(2), for instance, z_j = s_i + n_j + n_{j+1} (i marks the position of the nonzero element in the j-th row of A), and f_eff = 2f(1 - f). The first k symbols of z are therefore the output of a hidden Markov model (HMM). The underlying transition matrix, T, generating the source sequence can be estimated by means of the EM algorithm [?], which is a standard tool for solving such parameter estimation problems, with linear complexity. Having (approximately) revealed T, the DBPs can be calculated as described in Eq. (11).

For the general construction of the MN algorithm, one adds/subtracts rows of the concatenated matrix [A B] and the corresponding symbols of z until the following situation is reached: the first k rows of A form the identity matrix, regardless of the construction of the first k rows of B. From the knowledge of the noise level f and the structure of the i-th row of B, one can now calculate the effective noise, f_i^eff, of the i-th received source symbol. Since all {f_i^eff} are functions of a single noise level f, one can estimate the parameters of the Markovian process using some variant of the EM algorithm. Note that in the general case the first k rows of B contain loops, hence the {f_i^eff} are correlated. However, these correlations are assumed to be small, as the typical loop size is O(log k) [?].
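The effective flip rate of a syndrome symbol follows from counting parities: over GF(2), a sum of w independent noise bits, each flipped with probability f, is 1 exactly when an odd number of them are 1, which happens with probability (1 - (1 - 2f)^w)/2. The sketch below implements this, with w playing the role of the number of nonzeros in the corresponding row of B; treating the noise bits as independent ignores the loop-induced correlations mentioned above, and the general GF(q) case is not covered.

```python
def effective_flip(f: float, weight: int) -> float:
    """Flip rate of a GF(2) sum of `weight` i.i.d. noise bits, each 1 with probability f.

    The sum is 1 when an odd number of bits are 1: (1 - (1 - 2f)^weight) / 2.
    weight = 2 (z_j = s_i + n_j + n_{j+1}) gives 2f(1 - f).
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * f) ** weight)


for f in (0.05, 0.1, 0.2):
    print(f, effective_flip(f, 2), 2 * f * (1 - f))
```

Because f_eff grows with the row weight, rows of B with more nonzeros give noisier copies of the source, which is why the f_i^eff differ from symbol to symbol in the general construction.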

7. CONCLUDING REMARKS 

The only remaining major drawback of the presented decoder is that its complexity (per iteration) scales as O(kq^2), which may considerably slow down the decoder even for a moderate alphabet size. Note, however, that for large q, such that q^2 > k, and for low-entropy sequences, the transition matrix T is expected to be very sparse and dominated by elements of O(1). By taking advantage of the sparseness of T, the complexity of the decoder can be further reduced.
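One way to exploit a sparse T is to evaluate Eq. (11) over the stored nonzero entries of P(a, b) only, so that the cost per symbol scales with the number of nonzeros rather than with q². The sketch below illustrates this; the triple-list storage and the cut-off `eps` are illustrative assumptions, and dropping entries that are small but nonzero makes the resulting prior approximate.

```python
def sparse_joint(p_joint, eps=1e-12):
    """Keep only the entries P(a, b) > eps, as (a, b, value) triples."""
    return [(a, b, v) for a, row in enumerate(p_joint) for b, v in enumerate(row) if v > eps]


def dbp_sparse(entries, p, q_prev, q_next):
    """Eq. (11) using only the stored entries of P(a, b): O(nnz + q) per symbol instead of O(q^2)."""
    q = len(p)
    left = [0.0] * q    # left[b]  = sum_a P(a, b) Q_{n-1}^a
    right = [0.0] * q   # right[b] = sum_c P(b, c) Q_{n+1}^c
    for a, b, v in entries:
        left[b] += v * q_prev[a]
        right[a] += v * q_next[b]   # the same entry read as P(b' = a, c = b)
    prior = [left[b] * right[b] / p[b] if p[b] > 0.0 else 0.0 for b in range(q)]
    z = sum(prior)
    return [x / z for x in prior]


# A cyclic chain over q = 4 symbols: each symbol repeats (0.1) or moves to the next one (0.9).
q = 4
T = [[0.1 if j == i else 0.9 if j == (i + 1) % q else 0.0 for j in range(q)] for i in range(q)]
P = [1.0 / q] * q                       # T is doubly stochastic, so P is uniform
J = [[P[a] * T[a][b] for b in range(q)] for a in range(q)]
entries = sparse_joint(J)               # 8 of the 16 entries survive
prior = dbp_sparse(entries, P, [0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1])
print([round(x, 3) for x in prior])
```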

The one-dimensional Markovian decoder can easily be extended to the coding of a two-dimensional array of symbols, or even to arrays of symbols in higher dimensions [?]. The naive complexity of the DBP calculation scales as k^d q^{2d+1}, where k^d is the number of blocks in the array and d denotes the dimension. Using Markovian and Bayesian assumptions, the complexity can be reduced to O(k^d q^2).